Introduction

RcometsAnalytics supports all cohort-specific analyses of the COMETS consortium. This collaborative work is done via the COMETS harmonization group activities. For more information, see the COMETS website. This vignette demonstrates using the RcometsAnalytics R package from the command line, while
the tutorial demonstrates using RcometsAnalytics from the GUI. Documentation of the RcometsAnalytics R package can be found here manual. Each project could create their own vignette to run the analyses.

Data Input Format

The required input file should be in excel format with the following 6 sheets:

  1. Metabolites - from harmonized metabolites output
  2. SubjectMetabolites - abundances in columns and subject in rows
  3. SubjectData - other exposure and adjustment variables
  4. VarMap - maps the variables needed to conduct the cohort specific analysis. Specify the name of variables under CohortVariable column.
  5. Models - models used to conduct COMETS analysis. Outcomes, exposures and adjustment can specify multiple covariates delimited by spaces (ie: age bmi).
  6. ModelOptions - sheet containing model specific options

Complete documentation of the various sheets can be found in the package documentation. An example input file is available HERE.

Missing values

Only empty cells in any excel sheet become missing values when the R software reads the sheet into a data frame. Any non-numeric value in a cell for a continuous variable will result in an error.

Example Workflows for different analyses

1. Load Data

The first step of the analysis is to load in the data with the readCOMETSinput() function.
Input for this function is an Excel spreadsheet, per the description above.

# Retrieve the full path of the input data
dir <- system.file("extdata", package="RcometsAnalytics", mustWork=TRUE)
csvfile <- file.path(dir, "cometsInputAge.xlsx")

# Read in and process the input data
exmetabdata <- RcometsAnalytics::readCOMETSinput(csvfile)
## VarMap sheet is read in.
## Metabolites sheet is read in.
## SubjectMetabolites sheet is read in.
## SubjectData sheet is read in.
## Models sheet is read in.
## Model_Types sheet is read in.
## There are 16 categorical variables.
## Running Integrity Check...
## Joining, by = "hmdb_id"
## Begin testing models in Models sheet... 
## Filtering subjects according to the rule(s) age< 70. 836 of 1000 are retained.
## Warning in runModel.addRemVars(rem.obj, vars[oneVal], vars.type, "too few unique
## non-missing values", : The variable(s) female, fasted have been removed from
## adjvars because of: too few unique non-missing values
## Warning in runModel.addRemVars(rem.obj, tmp[rem], varSet, "correlated with
## another predictor", : The variable(s) multivitamin.2 have been removed from
## adjvars because of: correlated with another predictor
## Finished testing models in Models sheet.

To plot some the distribution of variances for each metabolite:

RcometsAnalytics::plotVar(exmetabdata,titlesize=12)
## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.

To plot the distribution of minimum/missing values:

RcometsAnalytics::plotMinvalues(exmetabdata,titlesize=12)
## Warning: The titlefont attribute is deprecated. Use title = list(font = ...)
## instead.

2. Get Model Data

There are 2 ways to specify your model, batch or interactive. In Batch mode, models are specified in your input file Models sheet. The model information needs to be read in with the function getModelData() and processed so the software knows which models to run. The following call defines the “1 Gender adjusted” model from the Models sheet in the input file to be run.

exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modlabel="1 Gender adjusted")

In Interactive mode, models are specified as parameters. The model information needs to be read in with the function getModelData() and processed so the software knows which models to run.
The following call defines the model with age and bmi_grp as the exposure variables, and includes only the subjects with age > 40 and bmi_grp > 2.

exmodeldata2 <- RcometsAnalytics::getModelData(exmetabdata, modelspec="Interactive",
    exposures=c("age","bmi_grp"), where=c("age>40","bmi_grp>2"))
## Filtering subjects according to the rule(s) age>40 AND bmi_grp>2. 279 of 1000 are retained.

3. Run Simple Unstratified Correlation Analysis

The runModel() function is the main function for running a single model, and by default, a correlation analysis is performed. The string “DPP” is a label for the cohort which will appear under the “cohort” column in the output.

excorrdata  <- RcometsAnalytics::runModel(exmodeldata2,exmetabdata,"DPP")

The output of the correlation analysis can then be compiled and output to an Excel file with the following function:

RcometsAnalytics::OutputListToExcel(filename="DPP_corr.xlsx", excorrdata)

To view the first 3 lines of the correlation analysis output, simply type:

RcometsAnalytics::showModel(excorrdata,nlines=3)
## 
## ModelSummary:
##   run                   outcomespec exposurespec term nobs message adjvars
## 1   1 _1_2_3_benzenetriol_sulfate_2          age       279                
## 2   2      _1_2_dipalmitoylglycerol          age       279                
## 3   3              _1_2_propanediol          age       279                
##   adjvars.removed adjspec   outcome_uid                        outcome
## 1                         CHEM100006374 1,2,3-benzenetriol sulfate (2)
## 2                             HMDB07098              DG(16:0/16:0/0:0)
## 3                             HMDB01881               Propylene glycol
##   exposure_uid     exposure adj_uid                metabolite_name
## 1          age Age at Entry         1,2,3-benzenetriol sulfate (2)
## 2          age Age at Entry                1,2-dipalmitoylglycerol
## 3          age Age at Entry                        1,2-propanediol
## 
## Effects:
##   run                   outcomespec exposurespec term    estimate      pvalue
## 1   1 _1_2_3_benzenetriol_sulfate_2          age  age 0.164624501 0.005846722
## 2   2      _1_2_dipalmitoylglycerol          age  age 0.068903188 0.251337451
## 3   3              _1_2_propanediol          age  age 0.001667259 0.977882521
##                  metabolite_name
## 1 1,2,3-benzenetriol sulfate (2)
## 2        1,2-dipalmitoylglycerol
## 3                1,2-propanediol
## 
## Errors_Warnings:
## [1] type    object  message
## <0 rows> (or 0-length row.names)
## 
## Table1:
##   variable in.model        type category   n n.unique min quartile1 median
## 1      age exposure  continuous          279       20  55        59     62
## 2  bmi_grp exposure categorical        3 269       NA  NA        NA     NA
## 3                                      4  10       NA  NA        NA     NA
##       mean quartile3 max n.missing
## 1 62.68459        66  74         0
## 2       NA        NA  NA        NA
## 3       NA        NA  NA        NA
## 
## Info:
##                       name               value
## 1                     date 2022-10-21 13:10:02
## 2                   cohort                 DPP
## 3 RcometsAnalytics version             2.0.6.0
## NULL

To display the heatmap of the resulting correlation matrix, use the showheatmap function.

RcometsAnalytics::showHeatmap(excorrdata)




To display the hierarchical clustering of the resulting correlation matrix, use the showHClust function. This diplay requires at least 2 rows and 2 columns in the correlation matrix.

exmodeldata<-RcometsAnalytics::getModelData(exmetabdata,modelspec = "Interactive",exposures = c("bmi_grp","age"))
excorrdata  <- RcometsAnalytics::runModel(exmodeldata,exmetabdata,"DPP")
RcometsAnalytics::showHClust(excorrdata, showticklabels=FALSE)

Results can be written to an output Excel file with the following command:

RcometsAnalytics::OutputListToExcel("Model1.xlsx", excorrdata)

4. Run Stratified Correlation Analysis

A stratified correlation analysis can be performed by specifiying stratification variables in the call to getModelData(). If more than one stratification variable is specified, then the strata will be defined by all unique combinations of the stratification variables. The following call will define a model stratified by race_grp.

  exmodeldata2 <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive",
                   outcomes=c("lactose","lactate"),
                exposures=c("age","bmi_grp"),strvars="race_grp")

The stratified correlation analysis is run by calling the runModel() function.

  excorrdata2  <- RcometsAnalytics::runModel(exmodeldata2,exmetabdata,"DPP")

5. Linear regression with lm

Call getModelData() to define a model which adjusts for age group, has lactose and lactate as outcome variables, and has age and bmi group as the exposure variables.

  exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age_grp",
                   outcomes=c("lactose","lactate"), exposures=c("age","bmi_grp"))

To run a linear regression using the lm function, a list of options must be passed into runModel() with the model option set to “lm”.

  lm_results  <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=list(model="lm"))
  print(lm_results)
## $ModelSummary
##   run outcomespec exposurespec    term wald.pvalue nobs message
## 1   1     lactose          age     age 0.164512670 1000        
## 2   1     lactose          age age_grp 0.202526776 1000        
## 3   2     lactate          age     age 0.601499033 1000        
## 4   2     lactate          age age_grp 0.983098762 1000        
## 5   3     lactose      bmi_grp bmi_grp 0.023879815 1000        
## 6   3     lactose      bmi_grp age_grp 0.000878276 1000        
## 7   4     lactate      bmi_grp bmi_grp 0.001662820 1000        
## 8   4     lactate      bmi_grp age_grp 0.620725533 1000        
##               adjvars adjvars.removed adjspec outcome_uid       outcome
## 1 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 2 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 3 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
## 4 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
## 5 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 6 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 7 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
## 8 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
##   exposure_uid     exposure             adj_uid metabolite_name
## 1          age Age at Entry age_grp.2 age_grp.3         lactose
## 2          age Age at Entry age_grp.2 age_grp.3         lactose
## 3          age Age at Entry age_grp.2 age_grp.3         lactate
## 4          age Age at Entry age_grp.2 age_grp.3         lactate
## 5      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactose
## 6      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactose
## 7      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactate
## 8      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactate
## 
## $Effects
##   run outcomespec exposurespec      term     estimate   std.error  statistic
## 1   1     lactose          age       age  0.034697534 0.024961296  1.3900534
## 2   2     lactate          age       age -0.002083854 0.003990177 -0.5222460
## 3   3     lactose      bmi_grp bmi_grp.2 -0.047905160 0.134163189 -0.3570663
## 4   3     lactose      bmi_grp bmi_grp.3  0.360610471 0.145954081  2.4707118
## 5   3     lactose      bmi_grp bmi_grp.4  0.420224133 0.577141138  0.7281133
## 6   4     lactate      bmi_grp bmi_grp.2  0.034207637 0.021367743  1.6009008
## 7   4     lactate      bmi_grp bmi_grp.3  0.080743228 0.023245640  3.4734783
## 8   4     lactate      bmi_grp bmi_grp.4 -0.126241151 0.091919426 -1.3733892
##         pvalue estimate.lower estimate.upper metabolite_name
## 1 0.1648232433   -0.014225708     0.08362078         lactose
## 2 0.6016151626   -0.009904458     0.00573675         lactate
## 3 0.7211179261   -0.310860179     0.21504986         lactose
## 4 0.0136512756    0.074545728     0.64667521         lactose
## 5 0.4667157218   -0.710951712     1.55139998         lactose
## 6 0.1097166272   -0.007672369     0.07608764         lactate
## 7 0.0005359045    0.035182610     0.12630385         lactate
## 8 0.1699410577   -0.306399914     0.05391761         lactate
## 
## $Errors_Warnings
## [1] type    object  message
## <0 rows> (or 0-length row.names)
## 
## $Table1
##   variable   in.model        type category    n n.unique min quartile1 median
## 1      age   exposure  continuous          1000       20  55        59     63
## 2  bmi_grp   exposure categorical        1  351       NA  NA        NA     NA
## 3                                        2  370       NA  NA        NA     NA
## 4                                        3  269       NA  NA        NA     NA
## 5                                        4   10       NA  NA        NA     NA
## 6  age_grp adjustment categorical        1  271       NA  NA        NA     NA
## 7                                        2  565       NA  NA        NA     NA
## 8                                        3  164       NA  NA        NA     NA
##     mean quartile3 max n.missing
## 1 63.208        67  74         0
## 2     NA        NA  NA        NA
## 3     NA        NA  NA        NA
## 4     NA        NA  NA        NA
## 5     NA        NA  NA        NA
## 6     NA        NA  NA        NA
## 7     NA        NA  NA        NA
## 8     NA        NA  NA        NA
## 
## $Info
##                        name
## 1                      date
## 2                    cohort
## 3  RcometsAnalytics version
## 4                 R version
## 5          operating system
## 6                input file
## 7                model name
## 8                  run type
## 9       op$check.cor.method
## 10      op$check.cor.cutoff
## 11       op$check.nsubjects
## 12           op$max.nstrata
## 13                 op$model
## 14        op$output.Effects
## 15   op$output.ModelSummary
## 16       op$output.ci_alpha
## 17     op$output.metab.cols
## 18           op$output.type
## 19          op$output.merge
## 20            op$chemEnrich
## 21  op$chemEnrich.adjPvalue
## 22     model.options$family
## 23                  outcome
## 24                 exposure
## 25                   strata
##                                                                              value
## 1                                                              2022-10-21 13:10:08
## 2                                                                              DPP
## 3                                                                          2.0.6.0
## 4                                                     R version 4.2.0 (2022-04-22)
## 5                                                            CentOS Linux 7 (Core)
## 6  /gpfs/gsfs10/users/wheelerwi/Rlibs/RcometsAnalytics/extdata/cometsInputAge.xlsx
## 7                                                                                 
## 8                                                                      Interactive
## 9                                                                         spearman
## 10                                                                            0.97
## 11                                                                              25
## 12                                                                              10
## 13                                                                              lm
## 14                                                                        exposure
## 15                                                                           anova
## 16                                                                            0.95
## 17                                                                 metabolite_name
## 18                                                                            xlsx
## 19                                                                            none
## 20                                                                               0
## 21                                                                            0.05
## 22                                                                                
## 23                                                                               *
## 24                                                                               *
## 25                                                                                
## 
## attr(,"ptime")
## [1] "Processing time: 0.313 sec"
## attr(,"class")
## [1] "runModel"

6. Linear regression with glm

Run a linear regression using the glm function for the same variables as above. The default family used with glm is “gaussian”, which corresponds to a linear regression. The Effects data frame will be the same as with lm, but the ModelSummary data frame will contain some different columns.

  glm_results  <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=list(model="glm"))
  print(all.equal(lm_results$Effects, glm_results$Effects))
## [1] TRUE
  print(glm_results$ModelSummary)
##   run outcomespec exposurespec converged    term wald.pvalue nobs message
## 1   1     lactose          age         1     age 0.164512670 1000        
## 2   1     lactose          age         1 age_grp 0.202526776 1000        
## 3   2     lactate          age         1     age 0.601499033 1000        
## 4   2     lactate          age         1 age_grp 0.983098762 1000        
## 5   3     lactose      bmi_grp         1 bmi_grp 0.023879815 1000        
## 6   3     lactose      bmi_grp         1 age_grp 0.000878276 1000        
## 7   4     lactate      bmi_grp         1 bmi_grp 0.001662820 1000        
## 8   4     lactate      bmi_grp         1 age_grp 0.620725533 1000        
##               adjvars adjvars.removed adjspec outcome_uid       outcome
## 1 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 2 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 3 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
## 4 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
## 5 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 6 age_grp.2 age_grp.3                 age_grp   HMDB00186 Alpha-Lactose
## 7 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
## 8 age_grp.2 age_grp.3                 age_grp   HMDB00190 L-Lactic acid
##   exposure_uid     exposure             adj_uid metabolite_name
## 1          age Age at Entry age_grp.2 age_grp.3         lactose
## 2          age Age at Entry age_grp.2 age_grp.3         lactose
## 3          age Age at Entry age_grp.2 age_grp.3         lactate
## 4          age Age at Entry age_grp.2 age_grp.3         lactate
## 5      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactose
## 6      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactose
## 7      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactate
## 8      bmi_grp      bmi_grp age_grp.2 age_grp.3         lactate

7. Logistic regression with glm

Call getModelData() to define a model which adjusts for age group, has nested_case as the outcome variable, and has lactose and lactate as the exposure variables. The variable nested_case must be a binary 0-1 variable.

  exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age_grp",
                   outcomes="nested_case", exposures=c("lactose","lactate"))

To run a logistic regression, the list of options op must also include a model.options list with family set to “binomial”.

  op <- list(model="glm", model.options=list(family="binomial"))
  glm_results  <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
  print(glm_results)
## $ModelSummary
##   run outcomespec exposurespec converged    term wald.pvalue nobs message
## 1   1 nested_case      lactose         1 lactose  0.51856182 1000        
## 2   1 nested_case      lactose         1 age_grp  0.71229325 1000        
## 3   2 nested_case      lactate         1 lactate  0.01003264 1000        
## 4   2 nested_case      lactate         1 age_grp  0.63652620 1000        
##               adjvars adjvars.removed adjspec outcome_uid     outcome
## 1 age_grp.2 age_grp.3                 age_grp nested_case nested_case
## 2 age_grp.2 age_grp.3                 age_grp nested_case nested_case
## 3 age_grp.2 age_grp.3                 age_grp nested_case nested_case
## 4 age_grp.2 age_grp.3                 age_grp nested_case nested_case
##   exposure_uid      exposure             adj_uid metabolite_name
## 1    HMDB00186 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 2    HMDB00186 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 3    HMDB00190 L-Lactic acid age_grp.2 age_grp.3         lactate
## 4    HMDB00190 L-Lactic acid age_grp.2 age_grp.3         lactate
## 
## $Effects
##   run outcomespec exposurespec    term   estimate  std.error statistic
## 1   1 nested_case      lactose lactose 0.02268456 0.03513914 0.6455639
## 2   2 nested_case      lactate lactate 0.57214513 0.22221799 2.5747022
##       pvalue exp.estimate exp.std.error estimate.lower estimate.upper
## 1 0.51856182     1.022944    0.03594537    -0.04618689     0.09155601
## 2 0.01003264     1.772064    0.39378456     0.13660588     1.00768438
##   exp.estimate.lower exp.estimate.upper metabolite_name
## 1          0.9548635           1.095878         lactose
## 2          1.1463763           2.739251         lactate
## 
## $Errors_Warnings
## [1] type    object  message
## <0 rows> (or 0-length row.names)
## 
## $Table1
##      variable   in.model        type category    n n.outcomeEqual0
## 1 nested_case    outcome  continuous          1000             502
## 2     age_grp adjustment categorical        1  271             142
## 3                                           2  565             280
## 4                                           3  164              80
##   n.outcomeEqual1 n.unique min quartile1 median  mean quartile3 max n.missing
## 1             498        2   0         0      0 0.498         1   1         0
## 2             129       NA  NA        NA     NA    NA        NA  NA        NA
## 3             285       NA  NA        NA     NA    NA        NA  NA        NA
## 4              84       NA  NA        NA     NA    NA        NA  NA        NA
## 
## $Info
##                        name
## 1                      date
## 2                    cohort
## 3  RcometsAnalytics version
## 4                 R version
## 5          operating system
## 6                input file
## 7                model name
## 8                  run type
## 9       op$check.cor.method
## 10      op$check.cor.cutoff
## 11       op$check.nsubjects
## 12           op$max.nstrata
## 13                 op$model
## 14        op$output.Effects
## 15   op$output.ModelSummary
## 16       op$output.ci_alpha
## 17     op$output.metab.cols
## 18           op$output.type
## 19          op$output.merge
## 20            op$chemEnrich
## 21  op$chemEnrich.adjPvalue
## 22     model.options$family
## 23       model.options$link
## 24                  outcome
## 25                 exposure
## 26                   strata
##                                                                              value
## 1                                                              2022-10-21 13:10:08
## 2                                                                              DPP
## 3                                                                          2.0.6.0
## 4                                                     R version 4.2.0 (2022-04-22)
## 5                                                            CentOS Linux 7 (Core)
## 6  /gpfs/gsfs10/users/wheelerwi/Rlibs/RcometsAnalytics/extdata/cometsInputAge.xlsx
## 7                                                                                 
## 8                                                                      Interactive
## 9                                                                         spearman
## 10                                                                            0.97
## 11                                                                              25
## 12                                                                              10
## 13                                                                             glm
## 14                                                                        exposure
## 15                                                                           anova
## 16                                                                            0.95
## 17                                                                 metabolite_name
## 18                                                                            xlsx
## 19                                                                            none
## 20                                                                               0
## 21                                                                            0.05
## 22                                                                        binomial
## 23                                                                           logit
## 24                                                                     nested_case
## 25                                                                               *
## 26                                                                                
## 
## attr(,"ptime")
## [1] "Processing time: 0.285 sec"
## attr(,"class")
## [1] "runModel"

8. Poisson regression with glm

Call getModelData() to define a model which adjusts for age group, has n_visits as the outcome variable, and has lactose and lactate as the exposure variables. The variable n_visits must be a non-negative integer valued variable.

  exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age_grp",
                   outcomes="n_visits", exposures=c("lactose","lactate"))

To run a Poisson regression, the list of options op must also include a model.options list with family set to “poisson”.

  op <- list(model="glm", model.options=list(family="poisson"))
  poisson_results  <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
  print(poisson_results)
## $ModelSummary
##   run outcomespec exposurespec converged    term wald.pvalue nobs message
## 1   1    n_visits      lactose         1 lactose   0.6105400 1000        
## 2   1    n_visits      lactose         1 age_grp   0.3528081 1000        
## 3   2    n_visits      lactate         1 lactate   0.8271571 1000        
## 4   2    n_visits      lactate         1 age_grp   0.3719447 1000        
##               adjvars adjvars.removed adjspec outcome_uid  outcome exposure_uid
## 1 age_grp.2 age_grp.3                 age_grp    n_visits n_visits    HMDB00186
## 2 age_grp.2 age_grp.3                 age_grp    n_visits n_visits    HMDB00186
## 3 age_grp.2 age_grp.3                 age_grp    n_visits n_visits    HMDB00190
## 4 age_grp.2 age_grp.3                 age_grp    n_visits n_visits    HMDB00190
##        exposure             adj_uid metabolite_name
## 1 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 2 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 3 L-Lactic acid age_grp.2 age_grp.3         lactate
## 4 L-Lactic acid age_grp.2 age_grp.3         lactate
## 
## $Effects
##   run outcomespec exposurespec    term   estimate  std.error statistic
## 1   1    n_visits      lactose lactose 0.00624383 0.01225956 0.5093028
## 2   2    n_visits      lactate lactate 0.01679890 0.07693599 0.2183491
##      pvalue estimate.lower estimate.upper metabolite_name
## 1 0.6105400    -0.01778447     0.03027213         lactose
## 2 0.8271571    -0.13399287     0.16759068         lactate
## 
## $Errors_Warnings
## [1] type    object  message
## <0 rows> (or 0-length row.names)
## 
## $Table1
##   variable   in.model        type category    n n.unique min quartile1 median
## 1 n_visits    outcome  continuous          1000        6   0         1      2
## 2  age_grp adjustment categorical        1  271       NA  NA        NA     NA
## 3                                        2  565       NA  NA        NA     NA
## 4                                        3  164       NA  NA        NA     NA
##    mean quartile3 max n.missing
## 1 2.037         3   5         0
## 2    NA        NA  NA        NA
## 3    NA        NA  NA        NA
## 4    NA        NA  NA        NA
## 
## $Info
##                        name
## 1                      date
## 2                    cohort
## 3  RcometsAnalytics version
## 4                 R version
## 5          operating system
## 6                input file
## 7                model name
## 8                  run type
## 9       op$check.cor.method
## 10      op$check.cor.cutoff
## 11       op$check.nsubjects
## 12           op$max.nstrata
## 13                 op$model
## 14        op$output.Effects
## 15   op$output.ModelSummary
## 16       op$output.ci_alpha
## 17     op$output.metab.cols
## 18           op$output.type
## 19          op$output.merge
## 20            op$chemEnrich
## 21  op$chemEnrich.adjPvalue
## 22     model.options$family
## 23       model.options$link
## 24                  outcome
## 25                 exposure
## 26                   strata
##                                                                              value
## 1                                                              2022-10-21 13:10:08
## 2                                                                              DPP
## 3                                                                          2.0.6.0
## 4                                                     R version 4.2.0 (2022-04-22)
## 5                                                            CentOS Linux 7 (Core)
## 6  /gpfs/gsfs10/users/wheelerwi/Rlibs/RcometsAnalytics/extdata/cometsInputAge.xlsx
## 7                                                                                 
## 8                                                                      Interactive
## 9                                                                         spearman
## 10                                                                            0.97
## 11                                                                              25
## 12                                                                              10
## 13                                                                             glm
## 14                                                                        exposure
## 15                                                                           anova
## 16                                                                            0.95
## 17                                                                 metabolite_name
## 18                                                                            xlsx
## 19                                                                            none
## 20                                                                               0
## 21                                                                            0.05
## 22                                                                         poisson
## 23                                                                             log
## 24                                                                        n_visits
## 25                                                                               *
## 26                                                                                
## 
## attr(,"ptime")
## [1] "Processing time: 0.294 sec"
## attr(,"class")
## [1] "runModel"

9. Cox proportional hazards regression (survival model)

Call getModelData() to define a model which adjusts for age group, has event as the outcome variable, time as the time-to-event variable, and has lactose and lactate as the exposure variables. The variable event must be binary (0-1), and variable time must be positive.

  exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age_grp",
                   outcomes="event", timevar="time", exposures=c("lactose","lactate"))

To run a survival model, the list of options op must also include a model.options list with model set to “coxph”.

  op <- list(model="coxph")
  coxph_results  <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
  print(coxph_results)
## $ModelSummary
##   run outcomespec exposurespec    term wald.pvalue nobs message
## 1   1       event      lactose lactose   0.7526633 1000        
## 2   1       event      lactose age_grp   0.3581715 1000        
## 3   2       event      lactate lactate   0.6910693 1000        
## 4   2       event      lactate age_grp   0.3754580 1000        
##               adjvars adjvars.removed adjspec outcome_uid outcome exposure_uid
## 1 age_grp.2 age_grp.3                 age_grp       event   event    HMDB00186
## 2 age_grp.2 age_grp.3                 age_grp       event   event    HMDB00186
## 3 age_grp.2 age_grp.3                 age_grp       event   event    HMDB00190
## 4 age_grp.2 age_grp.3                 age_grp       event   event    HMDB00190
##        exposure             adj_uid metabolite_name
## 1 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 2 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 3 L-Lactic acid age_grp.2 age_grp.3         lactate
## 4 L-Lactic acid age_grp.2 age_grp.3         lactate
## 
## $Effects
##   run outcomespec exposurespec    term   estimate  std.error statistic
## 1   1       event      lactose lactose 0.01104201 0.03503959 0.3151295
## 2   2       event      lactate lactate 0.08702160 0.21897494 0.3974044
##      pvalue exp.estimate exp.std.error estimate.lower estimate.upper
## 1 0.7526633     1.011103    0.03542864    -0.05763432     0.07971834
## 2 0.6910693     1.090920    0.23888419    -0.34216140     0.51620459
##   exp.estimate.lower exp.estimate.upper metabolite_name
## 1          0.9439951           1.082982         lactose
## 2          0.7102336           1.675656         lactate
## 
## $Errors_Warnings
## [1] type    object  message
## <0 rows> (or 0-length row.names)
## 
## $Table1
##   variable   in.model        type category    n n.outcomeEqual0 n.outcomeEqual1
## 1    event    outcome  continuous          1000             749             251
## 2  age_grp adjustment categorical        1  271             200              71
## 3                                        2  565             420             145
## 4                                        3  164             129              35
## 5     time       time  continuous          1000             749             251
##   n.unique min quartile1 median   mean quartile3 max n.missing
## 1        2   0         0      0  0.251      1.00   1         0
## 2       NA  NA        NA     NA     NA        NA  NA        NA
## 3       NA  NA        NA     NA     NA        NA  NA        NA
## 4       NA  NA        NA     NA     NA        NA  NA        NA
## 5       49   0        11     24 23.828     35.25  48         0
## 
## $Info
##                        name
## 1                      date
## 2                    cohort
## 3  RcometsAnalytics version
## 4                 R version
## 5          operating system
## 6                input file
## 7                model name
## 8                  run type
## 9       op$check.cor.method
## 10      op$check.cor.cutoff
## 11       op$check.nsubjects
## 12           op$max.nstrata
## 13                 op$model
## 14        op$output.Effects
## 15   op$output.ModelSummary
## 16       op$output.ci_alpha
## 17     op$output.metab.cols
## 18           op$output.type
## 19          op$output.merge
## 20            op$chemEnrich
## 21  op$chemEnrich.adjPvalue
## 22       model.options$ties
## 23     model.options$family
## 24                  outcome
## 25                 exposure
## 26                   strata
##                                                                              value
## 1                                                              2022-10-21 13:10:09
## 2                                                                              DPP
## 3                                                                          2.0.6.0
## 4                                                     R version 4.2.0 (2022-04-22)
## 5                                                            CentOS Linux 7 (Core)
## 6  /gpfs/gsfs10/users/wheelerwi/Rlibs/RcometsAnalytics/extdata/cometsInputAge.xlsx
## 7                                                                                 
## 8                                                                      Interactive
## 9                                                                         spearman
## 10                                                                            0.97
## 11                                                                              25
## 12                                                                              10
## 13                                                                           coxph
## 14                                                                        exposure
## 15                                                                           anova
## 16                                                                            0.95
## 17                                                                 metabolite_name
## 18                                                                            xlsx
## 19                                                                            none
## 20                                                                               0
## 21                                                                            0.05
## 22                                                                           efron
## 23                                                                        binomial
## 24                                                                           event
## 25                                                                               *
## 26                                                                                
## 
## attr(,"ptime")
## [1] "Processing time: 0.295 sec"
## attr(,"class")
## [1] "runModel"

10. Conditional logistic regression model

Call getModelData() to define a model which adjusts for age group, has nested_case as the outcome variable, matchedSet as the group variable, and has lactose and lactate as the exposure variables. The variable nested_case must be binary (0-1), and variable matchedSet defines the matched sets of groups in the data.

  exmodeldata <- RcometsAnalytics::getModelData(exmetabdata,modelspec="Interactive", adjvars="age_grp",
                   outcomes="nested_case", groupvar="matchedSet", exposures=c("lactose","lactate"))

To run a survival model, the list of options op must also include a model.options list with model set to “clogit”.

  op <- list(model="clogit")
  clogit_results  <- RcometsAnalytics::runModel(exmodeldata, exmetabdata, "DPP", op=op)
  print(clogit_results$ModelSummary)
##   run outcomespec exposurespec    term wald.pvalue nobs message
## 1   1 nested_case      lactose lactose  0.51379179 1000        
## 2   1 nested_case      lactose age_grp  0.68081085 1000        
## 3   2 nested_case      lactate lactate  0.01093298 1000        
## 4   2 nested_case      lactate age_grp  0.63719144 1000        
##               adjvars adjvars.removed adjspec outcome_uid     outcome
## 1 age_grp.2 age_grp.3                 age_grp nested_case nested_case
## 2 age_grp.2 age_grp.3                 age_grp nested_case nested_case
## 3 age_grp.2 age_grp.3                 age_grp nested_case nested_case
## 4 age_grp.2 age_grp.3                 age_grp nested_case nested_case
##   exposure_uid      exposure             adj_uid metabolite_name
## 1    HMDB00186 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 2    HMDB00186 Alpha-Lactose age_grp.2 age_grp.3         lactose
## 3    HMDB00190 L-Lactic acid age_grp.2 age_grp.3         lactate
## 4    HMDB00190 L-Lactic acid age_grp.2 age_grp.3         lactate
  print(clogit_results$Effects)
##   run outcomespec exposurespec    term   estimate  std.error statistic
## 1   1 nested_case      lactose lactose 0.02218697 0.03397984 0.6529449
## 2   2 nested_case      lactate lactate 0.57189296 0.22472705 2.5448337
##       pvalue exp.estimate exp.std.error estimate.lower estimate.upper
## 1 0.51379179     1.022435    0.03474218     -0.0444123     0.08878624
## 2 0.01093298     1.771617    0.39813036      0.1314360     1.01234988
##   exp.estimate.lower exp.estimate.upper metabolite_name
## 1          0.9565595           1.092847         lactose
## 2          1.1404650           2.752060         lactate

11. Run Analysis on all models defined in the input Excel sheet (‘super-batch’ mode)

All models desginated in the input file can be run with one command, and individual output Excel files or correlation results will be written in the current directory by default. The function returns a list of objects.

 allresults <- RcometsAnalytics::runAllModels(exmetabdata,writeTofile=TRUE)
sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /usr/local/intel/compilers_and_libraries_2020.2.254/linux/mkl/lib/intel64_lin/libmkl_rt.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-157             lubridate_1.8.0          webshot_0.5.3           
##   [4] RColorBrewer_1.1-3       httr_1.4.4               tools_4.2.0             
##   [7] backports_1.4.1          bslib_0.4.0              ISwR_2.0-8              
##  [10] utf8_1.2.2               R6_2.5.1                 rpart_4.1.16            
##  [13] DBI_1.1.3                lazyeval_0.2.2           colorspace_2.0-3        
##  [16] nnet_7.3-17              withr_2.5.0              mnormt_2.1.0            
##  [19] tidyselect_1.1.2         gridExtra_2.3            compiler_4.2.0          
##  [22] cli_3.4.0                TSP_1.2-1                plotly_4.10.0           
##  [25] labeling_0.4.2           sass_0.4.2               scales_1.2.1            
##  [28] psych_2.2.5              stringr_1.4.1            digest_0.6.29           
##  [31] rmarkdown_2.16           pkgconfig_2.0.3          htmltools_0.5.3         
##  [34] parallelly_1.32.1        fastmap_1.1.0            readxl_1.4.1            
##  [37] htmlwidgets_1.5.4        rlang_1.0.5              rstudioapi_0.14         
##  [40] farver_2.1.1             jquerylib_0.1.4          generics_0.1.3          
##  [43] jsonlite_1.8.0           crosstalk_1.2.0          dendextend_1.16.0       
##  [46] dplyr_1.0.10             ModelMetrics_1.2.2.2     magrittr_2.0.3          
##  [49] Matrix_1.5-1             Rcpp_1.0.9               munsell_0.5.0           
##  [52] fansi_1.0.3              viridis_0.6.2            lifecycle_1.0.2         
##  [55] stringi_1.7.8            pROC_1.18.0              yaml_2.3.5              
##  [58] MASS_7.3-57              plyr_1.8.7               recipes_1.0.1           
##  [61] grid_4.2.0               parallel_4.2.0           listenv_0.8.0           
##  [64] ppcor_1.1                lattice_0.20-45          splines_4.2.0           
##  [67] knitr_1.40               pillar_1.8.1             corpcor_1.6.10          
##  [70] future.apply_1.9.1       reshape2_1.4.4           codetools_0.2-18        
##  [73] stats4_4.2.0             glue_1.6.2               evaluate_0.16           
##  [76] data.table_1.14.2        vctrs_0.4.1              foreach_1.5.2           
##  [79] cellranger_1.1.0         gtable_0.3.1             purrr_0.3.4             
##  [82] tidyr_1.2.1              future_1.28.0            heatmaply_1.3.0         
##  [85] assertthat_0.2.1         cachem_1.0.6             ggplot2_3.3.6           
##  [88] xfun_0.33                gower_1.0.0              prodlim_2019.11.13      
##  [91] RcometsAnalytics_2.0.7.0 broom_1.0.1              class_7.3-20            
##  [94] survival_3.3-1           viridisLite_0.4.1        timeDate_4021.104       
##  [97] seriation_1.3.6          tibble_3.1.8             iterators_1.0.14        
## [100] registry_0.5-1           hardhat_1.2.0            lava_1.6.10             
## [103] globals_0.16.1           ellipsis_0.3.2           caret_6.0-93            
## [106] subselect_0.15.3         ipred_0.9-13